How does a model inversion attack work?
- The attacker first trains a separate ML model, known as an inversion model, on the outputs of the target model
- The goal is to predict the input data (the original dataset used to train the target model)
- The attacker can then exploit sensitive information recovered from the reconstructed inputs
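A minimal sketch of this idea, assuming a scikit-learn digit classifier as the target and an MLP regressor as the inversion model (both model choices and the auxiliary query set are illustrative assumptions, not the method of any specific paper):

```python
# Minimal model-inversion sketch: the attacker trains an "inversion model" that
# maps the target model's confidence vectors back to approximate inputs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier, MLPRegressor

X, y = load_digits(return_X_y=True)              # 8x8 digit images flattened to 64 features
X_priv, X_aux, y_priv, y_aux = train_test_split(X, y, test_size=0.5, random_state=0)

# Victim: target model trained on private data (the attacker only sees its outputs).
target = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
target.fit(X_priv, y_priv)

# Attacker: queries the target on auxiliary data it controls, then learns the
# inverse mapping confidence-vector -> input.
conf_aux = target.predict_proba(X_aux)           # what the query API reveals
inversion = MLPRegressor(hidden_layer_sizes=(128,), max_iter=1000, random_state=0)
inversion.fit(conf_aux, X_aux)

# Given only the confidence vector for some private input, reconstruct an approximation.
conf_secret = target.predict_proba(X_priv[:1])
x_reconstructed = inversion.predict(conf_secret)
print("reconstruction MSE:", np.mean((x_reconstructed - X_priv[:1]) ** 2))
```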
Types of MIA attacks:
- Query based attacks: exploit the model’s query interface to extract information about the training data
- Membership Inference Attacks: determine whether a particular data sample was part of the model’s training dataset
- Generative Model-based attacks: Generative adversarial networks (GAN) used to invert the target model
Prevention Methods
- Input and output masking – encrypt both model input and output
- Secure multi-party computation (SMPC) – multiple parties work together on a task involving their private data without revealing that data to each other
- Differential privacy – add calibrated noise (to data, training, or released outputs) to protect individual privacy (see the sketch after this list)
- Federated learning – model training occurs across decentralized clients; only aggregated model updates are communicated to a central server
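As a concrete (and deliberately simplified) illustration of the differential privacy item above, the sketch below releases a noisy mean via the Laplace mechanism; the dataset, clipping bounds, and epsilon values are assumptions for demonstration, not a drop-in training defense:

```python
# Minimal differential-privacy sketch (Laplace mechanism on an aggregate query,
# not full DP training): noise scaled to sensitivity/epsilon hides whether any
# single record is present in the data.
import numpy as np

rng = np.random.default_rng(0)
incomes = rng.integers(20_000, 120_000, size=1_000)   # hypothetical private records

def dp_mean(values, lower, upper, epsilon):
    """Release a differentially private mean via the Laplace mechanism."""
    clipped = np.clip(values, lower, upper)
    true_mean = clipped.mean()
    # Sensitivity of the mean when one record changes within [lower, upper].
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_mean + noise

print("epsilon=0.1:", dp_mean(incomes, 20_000, 120_000, epsilon=0.1))   # noisier
print("epsilon=1.0:", dp_mean(incomes, 20_000, 120_000, epsilon=1.0))   # closer to true mean
```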
Model Extraction Attack
The adversary learns a close approximation of the target model f using as few queries as possible. An extraction attack reveals the model’s architecture or parameters, while an oracle attack lets the adversary construct a substitute model from query responses alone.
What are the implications?
- Undermine pay-for-prediction pricing model
- Facilitate privacy attacks
- Facilitate black-box attacks
How it works
The attacker extracts information from the target model by issuing a large number of queries, collecting the resulting input-output pairs into a dataset, and using that dataset to train a substitute model, as sketched below.
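A minimal sketch of this query-and-train loop, assuming a random-forest victim and a logistic-regression substitute (all models, query counts, and data are illustrative assumptions):

```python
# Minimal model-extraction sketch: query the victim, record its answers, and
# train a substitute model on the resulting (query, answer) dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = RandomForestClassifier(random_state=0).fit(X, y)   # attacker never sees this directly

# Attacker crafts queries (here: random inputs from a plausible range) and
# labels them with the victim's API responses.
rng = np.random.default_rng(0)
queries = rng.normal(size=(5000, 20))
stolen_labels = victim.predict(queries)

substitute = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# Fidelity: how often the substitute agrees with the victim on fresh inputs.
test = rng.normal(size=(1000, 20))
print("agreement with victim:",
      accuracy_score(victim.predict(test), substitute.predict(test)))
```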
Defending against model extraction is difficult.
- Proactive attack prevention
- Limit information gained by adversary
- Budget limit / throttling / quota
- If providing confidence scores, round them to reduce their precision (see the sketch after this list)
- Add noise to model weights while preserving overall accuracy of model
- Reactive
- Anomaly detection via query pattern / signature analysis
- Unusual queries
- Model watermarking
- Embed unique identifiers or watermarks in model
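A minimal sketch combining two proactive ideas from the list above, a per-client query quota and rounded confidence scores; the GuardedModel wrapper, its limits, and the underlying model are assumptions for illustration, not a hardened implementation:

```python
# Minimal defense sketch: throttle queries per client and coarsen confidences.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

class GuardedModel:
    def __init__(self, model, quota=1000, decimals=1):
        self.model = model
        self.quota = quota          # per-client query budget
        self.decimals = decimals    # round confidences to 10^-decimals
        self.queries_used = {}

    def predict_proba(self, client_id, X):
        used = self.queries_used.get(client_id, 0) + len(X)
        if used > self.quota:
            raise RuntimeError("query quota exceeded")        # throttling / budget limit
        self.queries_used[client_id] = used
        probs = self.model.predict_proba(X)
        return np.round(probs, self.decimals)                 # low-precision confidence scores

X, y = load_iris(return_X_y=True)
guarded = GuardedModel(LogisticRegression(max_iter=1000).fit(X, y), quota=5, decimals=1)
print(guarded.predict_proba("client-42", X[:2]))
```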
Membership Inference Attacks
Given a query input x and black-box access to the target model F, the membership inference attack answers the question of whether x ∈ D is true or false, where D is the dataset used to train F. The attack is successful if the attacker can determine with high confidence that x is contained in D.
If the attacker can infer whether x was used to train the model, what is the privacy concern?
- Assume f predicts cancer-related health outcomes
- If x was used to train f, the person corresponding to x likely has health issues.
Goal – to determine whether a particular data sample was part of the training dataset; the attacker is trying to expose the dataset. This is not unique to ML and applies anywhere a dataset is used.
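A minimal baseline sketch of membership inference, using a simple confidence-threshold test rather than the shadow-model attack summarized below; the dataset, model, and threshold are assumptions chosen to make the member/non-member gap visible:

```python
# Minimal membership-inference baseline: overfitted models tend to be more
# confident on training members than on unseen non-members.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

target = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_in, y_in)  # overfits easily

def is_member(x, threshold=0.9):
    """Guess membership from the target's top confidence on x."""
    return target.predict_proba(x.reshape(1, -1)).max() >= threshold

flagged_in = np.mean([is_member(x) for x in X_in])    # fraction of real members flagged
flagged_out = np.mean([is_member(x) for x in X_out])  # fraction of non-members flagged
print(f"flagged as member: train={flagged_in:.2f}, unseen={flagged_out:.2f}")
```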
Summary of “Membership Inference Attacks Against Machine Learning Models”
1. Introduction
- Objective: To assess the privacy risks of machine learning models, specifically their vulnerability to membership inference attacks.
- Approach: The authors develop an attack model that distinguishes between the target model’s outputs on training data and on unseen data.
- Key Insight: The attack exploits differences in the behavior of machine learning models on their training inputs compared to new inputs.
- Applications: Evaluates models created by commercial services (e.g., Google Prediction API and Amazon ML) using sensitive datasets like hospital discharge records.
2. Attack Methodology
- Black-Box Setting: The attacker only has query access to the model (e.g., API access) and does not know the model’s architecture or parameters.
- Shadow Models:
- To train the attack model, the authors create “shadow models” that mimic the target model’s behavior.
- These shadow models are trained on known datasets, allowing the attack model to learn patterns indicative of training set membership.
- Training Data for Shadow Models:
- Model-Based Synthesis: Uses the target model’s confidence scores to generate synthetic data similar to the training set.
- Statistical Information: Generates synthetic data based on known population statistics.
- Noisy Real Data: Uses approximations of the training dataset with added noise.
- Attack Model: A binary classifier predicts whether an input was part of the training set based on the target model’s prediction vectors.
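A deliberately simplified sketch of this shadow-model pipeline, with a single shadow model, synthetic data, and one attack classifier shared across classes (the paper trains many shadows and per-class attack models); every hyperparameter below is an assumption:

```python
# Simplified shadow-model membership inference: the attack model learns to tell
# "member" prediction vectors from "non-member" ones using a shadow model the
# attacker fully controls, then is applied to the real target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=4000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_target, X_shadow, y_target, y_shadow = train_test_split(X, y, test_size=0.5, random_state=0)

# Target model: the attacker only has query access to this.
Xt_in, Xt_out, yt_in, _ = train_test_split(X_target, y_target, test_size=0.5, random_state=1)
target = RandomForestClassifier(n_estimators=30, random_state=0).fit(Xt_in, yt_in)

# Shadow model: trained by the attacker on data it controls, mimicking the target.
Xs_in, Xs_out, ys_in, _ = train_test_split(X_shadow, y_shadow, test_size=0.5, random_state=2)
shadow = RandomForestClassifier(n_estimators=30, random_state=0).fit(Xs_in, ys_in)

# Attack training set: the shadow's prediction vectors, labeled member / non-member.
attack_X = np.vstack([shadow.predict_proba(Xs_in), shadow.predict_proba(Xs_out)])
attack_y = np.concatenate([np.ones(len(Xs_in)), np.zeros(len(Xs_out))])
attack_model = LogisticRegression(max_iter=1000).fit(attack_X, attack_y)

# Attack the real target: predict membership from its prediction vectors.
in_guesses = attack_model.predict(target.predict_proba(Xt_in))
out_guesses = attack_model.predict(target.predict_proba(Xt_out))
print("flagged as member: train =", in_guesses.mean(), " unseen =", out_guesses.mean())
```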
3. Evaluation
- Datasets Used:
- CIFAR (image recognition), Purchases (shopping data), Locations (Foursquare check-ins), Texas Hospital Stays (healthcare data), MNIST (handwritten digits), and UCI Adult Income.
- Target Models:
- Includes models from Google Prediction API, Amazon ML, and locally trained neural networks.
- Performance:
- Membership inference achieved high accuracy, especially on overfitted models and datasets with many output classes.
- Examples:
- Google Prediction API: 94% median precision for retail transactions dataset.
- Texas hospital stays: Over 70% precision for predicting training data membership.
4. Findings
- Factors Influencing Vulnerability:
- Overfitting: Overfitted models leak more information about their training data.
- Model Complexity: Models with many output classes are more susceptible.
- Dataset Characteristics: Smaller and less diverse datasets increase vulnerability.
- Adversary’s Knowledge:
- Even without prior knowledge of the training data distribution, attacks were effective using synthetic or noisy data for shadow models.
5. Mitigation Strategies
- Proposed Defenses:
- Restrict Prediction Output: Limit the prediction vector to the top-k classes or reduce its precision.
- Regularization: Techniques like L2 regularization reduce overfitting and information leakage.
- Increase Prediction Entropy: Use techniques such as temperature scaling to make outputs less distinct (see the sketch after this list).
- Differential Privacy: Adds noise to training to protect against inference attacks but may reduce model accuracy.
- Effectiveness:
- Mitigation strategies reduced attack precision but did not completely prevent membership inference.
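A minimal sketch of the “increase prediction entropy” idea via temperature scaling; the logits and temperature values are made-up numbers used only to show how higher temperatures flatten the prediction vector:

```python
# Temperature scaling: dividing logits by T > 1 flattens the output
# distribution, making member and non-member prediction vectors less distinct.
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([8.0, 2.0, 1.0, 0.5])          # a confident, "memorized-looking" output
print("T=1 :", np.round(softmax_with_temperature(logits, T=1.0), 3))
print("T=5 :", np.round(softmax_with_temperature(logits, T=5.0), 3))
print("T=20:", np.round(softmax_with_temperature(logits, T=20.0), 3))
```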
6. Comparison with Related Work
- Distinguishes membership inference from model inversion attacks, which infer sensitive attributes of inputs rather than their membership in the training set.
- Highlights limitations of model extraction attacks, which target model parameters rather than training data.
7. Practical Implications
- Privacy Risks: Membership in datasets such as healthcare records can reveal sensitive information (e.g., a person’s medical history).
- Machine Learning Services:
- Platforms like Google and Amazon need to provide transparency about the risks of using their models.
- They should incorporate privacy-preserving practices during model training and expose metrics for measuring leakage.
8. Conclusions
- The study demonstrates that machine learning models can leak sensitive training data through their outputs.
- The shadow training technique introduced is effective even with minimal knowledge of the target model.
- Calls for improved privacy protection in machine learning, emphasizing the importance of designing models resilient to membership inference.
Summary of “Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures”
This paper explores a class of Model Inversion (MI) Attacks that exploit confidence information revealed by machine learning (ML) models. The authors investigate the privacy risks associated with ML-as-a-Service platforms, using these attacks to infer sensitive information from decision trees and neural networks. They propose countermeasures to mitigate these attacks, highlighting challenges and potential solutions for improving ML model privacy.
Key Sections and Findings
1. Introduction
- Context: Machine learning models are extensively used in privacy-sensitive domains like lifestyle prediction, medical diagnoses, and facial recognition. ML-as-a-Service platforms further expose these models to external users.
- Threat: Model Inversion Attacks exploit APIs to infer sensitive information from training data.
- Contribution: The authors develop advanced MI attacks using confidence values and demonstrate them in two contexts:
- Lifestyle survey decision trees: Infer responses like marital infidelity.
- Facial recognition neural networks: Reconstruct recognizable images of individuals.
- Countermeasures: They explore methods like rounding confidence values and training adjustments to enhance privacy.
2. Background
- Model Inversion Overview:
- ML models map input features to output predictions, often revealing confidence levels.
- These confidence values can be exploited to infer sensitive features.
- Threat Model:
- Black-box attacks: Adversary queries the model without accessing its internal parameters.
- White-box attacks: Adversary has access to model parameters and training details.
3. Fredrikson et al.’s MI Algorithm
- Evaluated the limitations of an earlier MI attack on small-domain features like genomic markers.
- Found high false positive rates when applied to decision trees for sensitive data like survey responses.
4. Decision Tree MI Attacks
- White-Box Enhancements:
- New estimator uses training set counts to improve MI precision.
- Achieves perfect precision for training set participants, with no false positives.
- Black-Box Attacks:
- Adapted earlier algorithms to decision tree settings but found lower precision.
- Case Studies:
- Lifestyle survey (“Have you cheated?”):
- White-box attacks showed 593× improved precision over black-box.
- GSS survey (“Have you watched X-rated movies?”):
- Similar trends with perfect precision for white-box attacks.
5. Facial Recognition MI Attacks
- Objective: Reconstruct facial images from model confidence scores.
- Algorithms:
- Gradient-based optimization maximizes confidence scores for a target label (see the sketch below).
- Variants for three neural network architectures:
- Softmax regression: Sharp reconstructions with 75% accuracy.
- Multilayer Perceptron (MLP): Lower accuracy due to blurred images.
- Stacked Denoising Autoencoder (DAE): Further reduced accuracy but faster reconstruction.
- Experiments:
- Mechanical Turk studies showed human participants could identify reconstructed faces with 75–87% accuracy.
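A minimal sketch of gradient-based inversion against a softmax-regression model, run white-box on a small digit dataset rather than the paper’s facial-recognition setting; the step size, iteration count, and data are illustrative assumptions:

```python
# Gradient-based model inversion: ascend the input to maximize the model's
# confidence for a chosen target class, then inspect the reconstruction.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X = X / 16.0                                     # scale pixels to [0, 1]
model = LogisticRegression(max_iter=2000).fit(X, y)
W, b = model.coef_, model.intercept_             # softmax-regression parameters

def reconstruct(target_class, steps=500, lr=0.1):
    """Gradient ascent on the input to maximize P(target_class | x)."""
    x = np.zeros(X.shape[1])
    for _ in range(steps):
        logits = W @ x + b
        logits -= logits.max()
        p = np.exp(logits)
        p /= p.sum()
        # Gradient of log p_c with respect to x for multinomial logistic regression.
        grad = W[target_class] - p @ W
        x = np.clip(x + lr * grad, 0.0, 1.0)     # keep pixels in a valid range
    return x

recon = reconstruct(target_class=3)
print("model confidence on reconstruction:",
      model.predict_proba(recon.reshape(1, -1))[0, 3].round(3))
```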
6. Countermeasures
- For Decision Trees:
- Adjust sensitive feature depth in tree structure to reduce MI attack effectiveness.
- Optimal placement achieved high classification accuracy with limited inversion risk.
- For Neural Networks:
- Rounding confidence values significantly degraded attack success:
- Attack failed completely at a rounding level of 0.1.
- Future Directions:
- Integration of privacy-preserving metrics into training algorithms.
- Development of robust privacy-aware model architectures.
7. Related Work
- Highlights past research on:
- Linear reconstruction attacks.
- Privacy vulnerabilities in de-identified datasets.
- Differential privacy techniques.
- Distinguishes MI attacks by their focus on exploiting confidence scores rather than direct access to the underlying data.
8. Conclusion
- Impact:
- MI attacks using confidence scores present significant privacy risks, especially in sensitive applications.
- White-box attacks pose greater risks due to access to training details.
- Future Work:
- Develop comprehensive countermeasures combining heuristic and theoretical approaches.
- Explore methods to prevent MI attacks without degrading model performance.
Critical Takeaways
- MI Attacks Exploit Confidence Scores: Adversaries can infer sensitive information or reconstruct identifiable images by leveraging confidence information from ML APIs.
- White-Box vs. Black-Box Risks: White-box attacks are more effective but require greater access to model details.
- Countermeasures Are Feasible:
- Simple techniques like confidence rounding and feature depth adjustments can mitigate risks.
- Future methods need to balance privacy with utility.